Linear Regression Inference

Edward Vytlacil

Examples

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i,\]

where

  • \(Y_i\) is a dummy variable for school enrollment,
  • \(\mbox{Treat}_i\) is a dummy variable for receipt of Progresa, assigned by RCT,
  • \(\mbox{Girl}_i\) is a dummy variable for being a girl.

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

Let

  • \(Y_{1i}\) denote potential outcome with treatment,

  • \(Y_{0i}\) potential outcome without treatment.

How to interpret:

  • the coefficients \(\beta\)?
  • the error term \(\epsilon_i\)?

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

  • CATE for boys: \(~~\mathbb{E}[Y_{1i}-Y_{0i} \mid \mbox{Boy}] = \beta_1\),
  • CATE for girls: \(~~\mathbb{E}[Y_{1i}-Y_{0i} \mid \mbox{Girl}]= \beta_1 + \beta_3\),
  • ATE: \(~~\mathbb{E}[Y_{1i}-Y_{0i}] = \beta_1+\Pr[\mbox{Girl}_i=1] \cdot \beta_3.\)
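The ATE line follows from the two CATEs by the law of total expectation:

\[\mathbb{E}[Y_{1i}-Y_{0i}] = \Pr[\mbox{Girl}_i=0] \cdot \beta_1 + \Pr[\mbox{Girl}_i=1] \cdot (\beta_1+\beta_3) = \beta_1+\Pr[\mbox{Girl}_i=1] \cdot \beta_3.\]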

How to justify this interpretation of coefficients?

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

Gender difference:

  • without treatment: \(~~ \beta_2\),

  • with treatment: \(~~ \beta_2 + \beta_3\),

  • overall: \(~~ \beta_2 + \Pr[\mbox{Treat}_i=1] \cdot \beta_3\).

How to justify this interpretation of coefficients?

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

OLS estimates in this example are equivalent to sample means, diff in sample means, and diff in diff of sample means.

  • \(\widehat \beta_0\) is equivalent to sample mean of \(Y\) among untreated boys,
  • \(\widehat \beta_1\), \(\widehat \beta_2\) are equivalent to differences in sample means,
  • \(\widehat \beta_3\) is equivalent to diff-in-diff in sample means.
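A quick numerical check of these equivalences on simulated data (a sketch; the variables below are illustrative stand-ins, not the Progresa data):

```r
set.seed(1)
N <- 4000
treat  <- rbinom(N, 1, 0.5)   # RCT-style treatment dummy
girl   <- rbinom(N, 1, 0.5)
school <- rbinom(N, 1, 0.80 + 0.03 * treat + 0.01 * treat * girl)
fit <- lm(school ~ treat * girl)
m <- function(t, g) mean(school[treat == t & girl == g])  # cell means
# beta0-hat: sample mean of Y among untreated boys
all.equal(unname(coef(fit)["(Intercept)"]), m(0, 0))
# beta1-hat: difference in sample means, treated vs. untreated boys
all.equal(unname(coef(fit)["treat"]), m(1, 0) - m(0, 0))
# beta3-hat: diff-in-diff of cell means
all.equal(unname(coef(fit)["treat:girl"]),
          (m(1, 1) - m(0, 1)) - (m(1, 0) - m(0, 0)))
```

Because the model is saturated, the fitted values are exactly the four cell means, so each `all.equal` returns `TRUE`.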

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

OLS estimates in this example are equivalent to sample means, diff in sample means, and diff in diff of sample means.

  • How to show?

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

OLS estimates in this example are equivalent to sample means, diff in sample means, and diff in diff of sample means.

  • Estimate
    • CATE for boys: \(~~\widehat \beta_1\),
    • CATE for girls: \(~~\widehat \beta_1 + \widehat \beta_3\),
    • ATE: \(~~\widehat \beta_1 + \overline{\mbox{Girl}} \cdot \widehat \beta_3\).

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

OLS estimates in this example are equivalent to sample means, diff in sample means, and diff in diff of sample means.

  • Estimate average gender difference
    • without treatment: \(~~\widehat \beta_2\),
    • with treatment: \(~~\widehat \beta_2 + \widehat \beta_3\),
    • overall: \(~~\widehat \beta_2 + \overline{\mbox{Treat}} \cdot \widehat \beta_3\).

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\] Consider testing null of zero average effect for

  • boys, \(~H_0 : \beta_1 = 0\),
  • girls, \(~H_0: \beta_1 + \beta_3 = 0\),
  • both boys and girls: \(~H_0: \beta_1 =0\) and \(\beta_3=0\),
  • on average: \(~H_0: \beta_1+\Pr[\mbox{Girl}_i=1] \cdot \beta_3=0\),
  • no gender difference in average effect: \(~H_0 : \beta_3 = 0\).

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\] Consider asymptotic CI for

  • CATE for boys, \(~~\beta_1\),
  • CATE for girls, \(~~\beta_1 + \beta_3\),
  • ATE: \(~~\beta_1+\Pr[\mbox{Girl}_i=1] \cdot \beta_3\),
  • gender difference in average effects: \(~~\beta_3\).

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\] Likewise estimates, testing nulls, and confidence intervals for gender differences.

Example 1: Progresa

In these examples, some parameters

  • are single elements of \(\beta\),
    • e.g., CATE for boys, \(~~\beta_1\),
  • are more generally linear functions of \(\beta\),
    • e.g., CATE for girls, \(~~\beta_1 + \beta_3\),
  • are nonlinear functions of \((\beta,\mathbb{E}(X))\),
    • e.g., ATE: \(~~\beta_1+\Pr[\mbox{Girl}_i=1] \cdot \beta_3\).

Estimation/CIs for each case?

Example 1: Progresa

In these examples, some tests are of

  • single linear restriction on \(\beta\),
    • e.g., no effect on average for boys.
  • multiple linear restrictions on \(\beta\),
    • e.g., no effect on average for boys or for girls,
  • restrictions involving both \(\beta\) and \(\mathbb{E}(X)\),
    • e.g., no effect on average.

Testing for each case?

Example 2: Cost Function (Nerlove 1963)

Consider the following cost function for electric companies:

\[\begin{multline*} \log C_i = \beta_0 + \beta_1 \log Q_i + \beta_2 \log PL_i +\\ \beta_3 \log PK_i + \beta_4 \log PF_i + \epsilon_i.\end{multline*}\]

  • \(C_i\) is total cost,

  • \(Q_i\) is output,

  • \(PL_i\) is unit price of labor,

  • \(PK_i\) is unit price of capital,

  • \(PF_i\) is unit price of fuel.

Example 2: Cost Function (Nerlove 1963)

Consider the following cost function for electric companies:

\[\begin{multline*} \log C_i = \beta_0 + \beta_1 \log Q_i + \beta_2 \log PL_i +\\ \beta_3 \log PK_i + \beta_4 \log PF_i + \epsilon_i.\end{multline*}\]

How does entering outcome and covariates in logs change interpretation of coefficients?

Example 2: Cost Function (Nerlove 1963)

Consider the following cost function for electric companies:

\[\begin{multline*} \log C_i = \beta_0 + \beta_1 \log Q_i + \beta_2 \log PL_i +\\ \beta_3 \log PK_i + \beta_4 \log PF_i + \epsilon_i.\end{multline*}\] Consider

\[~H_0 : \beta_2+ \beta_3 + \beta_4 =1, ~~~ \mbox{vs}~~~H_1: \beta_2+ \beta_3 + \beta_4 \ne 1.\]

  • What is economic meaning of \(H_0\)?

  • \(H_0\) is a linear restriction on \(\beta\), how to test?

Example 3: Mincer Wage Equation

Mincer wage equation (Mincer 1958):

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • How does entering outcome variable in logs change interpretation of coefficients?

  • Model linear in parameters (\(\beta\)s) but non-linear in experience.

  • Expect \(\beta_2>0\), \(\beta_3<0\), concave function of experience.

Example 3: Mincer Wage Equation

Mincer wage equation (Mincer 1958):

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • Marginal effect of experience: \(\beta_2 + 2 \beta_3 \mbox{exp}_i\).

    • marg. effect if \(\mbox{exp}_i=0\): \(~~~\beta_2~\);

    • marg. effect if \(\mbox{exp}_i=10\): \(~~~\beta_2 + 20~ \beta_3\);

    • avg marg. effect: \(~~~\beta_2 + 2\beta_3 E[ \mbox{exp}_i]\).

Example: Mincer Wage Equation

Mincer wage equation (Mincer 1958):

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • Marginal effect of experience: \(\beta_2 + 2 \beta_3 \mbox{exp}_i\).

    • Supposing \(\beta_2>0\), \(\beta_3<0\), experience level that maximizes log wage: \[- \frac{\beta_2}{2 \beta_3}.\]

Example: Mincer Wage Equation

Mincer wage equation (Mincer 1958):

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • Estimate average effect of experience by \[\widehat \beta_{2,N} + 2 \cdot \widehat \beta_{3,N} \cdot \overline{\mbox{exp}}_N,\]

    • Estimator is a non-linear function of \((\widehat \beta, \overline{X})\).

Example: Mincer Wage Equation

Mincer wage equation (Mincer 1958):

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\] - Estimate experience level that maximizes log wage by \[- \frac{\widehat \beta_{2,N}}{2 \widehat \beta_{3,N}}.\]

  • Estimator is a non-linear function of \(\widehat \beta.\)

Implement Mincer Wage Equation

library(AER)
data(CPS1985)
df <- CPS1985[ , c("wage","education","experience")]
mean.exp <- mean(df$experience)
mean.exp
reg.1 <- lm(I(log(wage))~education+experience+I(experience^2),data=df)
reg.1
a <- c(0,0,1,2*mean.exp)
avg.eff.exp <- as.numeric(t(a)%*%coef(reg.1)) 
maxexp <- abs(coef(reg.1)[3]/(2*coef(reg.1)[4]))
  • Consider estimation using CPS 1985 data.

    • available in package AER.

    • after loading AER, see ?CPS1985 for description of data.

Implement Mincer Wage Equation

library(AER)
data(CPS1985)
df <- CPS1985[ , c("wage","education","experience")]
mean.exp <- mean(df$experience)
mean.exp
reg.1 <- lm(I(log(wage))~education+experience+I(experience^2),data=df)
reg.1
a <- c(0,0,1,2*mean.exp)
avg.eff.exp <- as.numeric(t(a)%*%coef(reg.1)) 
maxexp <- abs(coef(reg.1)[3]/(2*coef(reg.1)[4]))
[1] "Mean Experience: 17.82"

Call:
lm(formula = I(log(wage)) ~ education + experience + I(experience^2), 
    data = df)

Coefficients:
    (Intercept)        education       experience  I(experience^2)  
      0.5203218        0.0897561        0.0349403       -0.0005362  

Implement Mincer Wage Equation

mean.exp <- mean(df$experience)
mean.exp
reg.1 <- lm(I(log(wage))~education+experience+I(experience^2),data=df)
reg.1
a <- c(0,0,1,2*mean.exp)
avg.eff.exp <- as.numeric(t(a)%*%coef(reg.1)) 
maxexp <- abs(coef(reg.1)[3]/(2*coef(reg.1)[4]))
avg.eff.exp
maxexp
[1] "Estimated Avg Effect Experience: 0.0158"
[1] "Estimated Experience that Maximizes Wage: 32.6"

Implement Mincer Wage Equation

mean.exp <- mean(df$experience)
mean.exp
reg.1 <- lm(I(log(wage))~education+experience+I(experience^2),data=df)
reg.1
a <- c(0,0,1,2*mean.exp)
avg.eff.exp <- as.numeric(t(a)%*%coef(reg.1)) 
maxexp <- abs(coef(reg.1)[3]/(2*coef(reg.1)[4]))
avg.eff.exp
maxexp
  • \(\overline{\mbox{exp}}_N=\) 17.8

  • \(\widehat \beta_{2,N} =\) 0.0349

  • \(\widehat \beta_{3,N} =\) -0.000536

  • \(\widehat \beta_{2,N} + 2 \widehat \beta_{3,N} \overline{\mbox{exp}}_N=\) 0.0158

  • \(\left | \frac{\widehat \beta_{2,N}}{2 \widehat \beta_{3,N}}\right| =\) 32.6

Motivation: Mincer Wage Equation

  • \(\widehat \beta_{2,N} + 2 \widehat \beta_{3,N} \overline{\mbox{exp}}_N=\) 0.0158

  • \(\left | \frac{\widehat \beta_{2,N}}{2 \widehat \beta_{3,N}}\right| =\) 32.6

  • How to compute standard errors? construct confidence intervals? perform hypothesis tests?

Linear Combination of Coefficients

Recall Asymptotic Distribution of OLS

Under regularity conditions,
\[\sqrt{N} \left( \widehat \beta_N - \beta \right) \stackrel{d}{\rightarrow} Z,~~~Z \sim N(0,\Sigma),\] which by CMT implies that, for any continuous function \(g\), \[g\left(\sqrt{N} \left( \widehat \beta_N - \beta \right)\right) \stackrel{d}{\rightarrow} g(Z),~~~Z \sim N(0,\Sigma),\] and thus, for \(a\) any \((K+1)\times 1\) vector of constants, \[\sqrt{N} \left( a^{\prime} \widehat \beta_N - a^{\prime} \beta \right) \stackrel{d}{\rightarrow} N(0, a^{\prime} \Sigma a).\]

Standard Errors for Linear Combination of \(\beta\)

\[\sqrt{N} \left( a^{\prime} \widehat \beta_N - a^{\prime} \beta \right) \stackrel{d}{\rightarrow} N(0, a^{\prime} \Sigma a).\]

  • Let \(\widehat \Sigma_N\) denote a consistent estimator of \(\Sigma\), \[\widehat \Sigma_N \stackrel{p}{\rightarrow} \Sigma ~~\Rightarrow ~~ a^{\prime} \widehat \Sigma_N a \stackrel{p}{\rightarrow} a^{\prime} \Sigma a.\]

    \[ \mbox{s.e.}(a^{\prime} \widehat \beta_N) = \widehat \omega_N / \sqrt{N} ~~\mbox{where}~~ \widehat \omega^2_N = a^{\prime} \widehat \Sigma_N a \]

Test Null for Linear Combination of \(\beta\).

Consider \[H_0: a^{\prime} \beta = a^{\prime} b,~~~\mbox{vs}~~ H_1: a^{\prime} \beta \ne a^{\prime} b.\]

  • Define test statistic: \[T_N = \frac{a^{\prime} \widehat \beta_N - a^{\prime} b}{\mbox{s.e.}(a^{\prime} \widehat \beta_N)} .\]

Test Null for Linear Combination of \(\beta\).

\(T_N \stackrel{d}{\rightarrow} N(0,1)\) under \(H_0\).

For test at (asymptotic) level \(\alpha\),

  • reject \(H_0\) if \(| T_N | \ge c_{1-\alpha/2}\) where \(c_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)\).

  • equivalently, define \(p = 2 (1- \Phi(| T_N| ))\), reject null if \(p \le \alpha\).
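In R the ingredients of this test are one-liners; a sketch with illustrative numbers (any estimate and standard error can be substituted):

```r
est  <- 0.046                      # a' beta-hat (illustrative value)
se   <- 0.0096                     # s.e.(a' beta-hat) (illustrative value)
b0   <- 0                          # null value a'b
TN   <- (est - b0) / se            # t-statistic
crit <- qnorm(1 - 0.05 / 2)        # c_{1-alpha/2} for alpha = 0.05 (about 1.96)
p    <- 2 * (1 - pnorm(abs(TN)))   # asymptotic two-sided p-value
abs(TN) >= crit                    # TRUE here: reject H0 at the 5% level
```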

CI for Linear Combination of \(\beta\)

Define \[T_N(a^{\prime} b) = \frac{a^{\prime} \widehat \beta_N - a^{\prime} b}{\mbox{s.e.}(a^{\prime} \widehat \beta_N)} .\]

Invert (asymptotic) level \(\alpha\) test to form \(1-\alpha\) asymptotic CI: \[ \widehat C_N = \left \{a^{\prime} b: | T_N(a^{\prime} b) |\leq c_{1-\alpha/2}\right \} \\ = \left[ a^{\prime} \widehat\beta_N - c_{1-\alpha/2} \times \mbox{s.e.}(a^{\prime} \widehat \beta_N), ~a^{\prime} \widehat\beta_N + c_{1-\alpha/2} \times \mbox{s.e.}(a^{\prime} \widehat \beta_N) \right].\]

CI for Linear Combination of \(\beta\)

\[ \widehat C_N = \left \{a^{\prime} b: | T_N(a^{\prime} b) |\leq c_{1-\alpha/2}\right \}\]

\[\Rightarrow \Pr \left (a^{\prime} \beta \in \widehat{C}_N\right ) = \Pr\left (| T_N(a^{\prime} \beta)| \leq c_{1-\alpha/2}\right) \rightarrow 1-\alpha,\]

  • Thus \(\widehat C_N\) is asymptotic \(1-\alpha\) CI on \(a^{\prime} \beta\).

CI for Linear Combination of \(\beta\)

\[ \widehat C_N = \left \{a^{\prime} b: |T_N(a^{\prime} b) |\leq c_{1-\alpha/2}\right \}\]

\[\Rightarrow \Pr\left (a^{\prime} \beta \in \widehat{C}_N\right ) = \Pr\left (| T_N(a^{\prime} \beta ) | \leq c_{1-\alpha/2}\right) \rightarrow 1-\alpha,\]

  • Can use \(\widehat C_N\) to test null \(H_0: a^{\prime} \beta = a^{\prime} b\) for any \(a^{\prime} b\).

    • Reject \(H_0: a^{\prime} \beta = a^{\prime} b\) for any \(a^{\prime} b \not \in \widehat{C}_N\).

    • Fail to reject \(H_0: a^{\prime} \beta = a^{\prime} b\) for any \(a^{\prime} b \in \widehat{C}_N\).

Digression: Failure to Reject \(H_0\) as Evidence \(H_0\) is True

A common misconception is to interpret failure to reject the null hypothesis as evidence that the null is true.

  • Such an interpretation is only warranted if the power of the test against alternatives of interest is high.
  • If the test has low power against an alternative of interest, then the test is unlikely to reject the false null when that alternative is true; we should expect the test to (incorrectly) fail to reject the null even when that alternative holds.

Digression: Failure to Reject \(H_0\) as Evidence \(H_0\) is True

  • CI shows range of null hypotheses that are not rejected.

  • If estimating effect, how to interpret evidence if:

    • CI includes 0 but also values far from 0?

    • CI includes 0 and only values close to 0?

      • Sometimes called “precisely estimated 0” in applied work.

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

  • \(\mbox{CATE for girls:}~ \beta_1 + \beta_3 = a^{\prime}\beta,\)

  • \(\mbox{Estimated CATE for girls:}~ \widehat \beta_1 + \widehat \beta_3 = a^{\prime}\widehat \beta,\)

with \(a = (0,1,0,1)^{\prime}\). \(\mbox{s.e.}(a^{\prime} \widehat \beta_N) = \widehat \omega_N / \sqrt{N}\) where \[ \widehat \omega^2_N = a^{\prime} \widehat \Sigma_N a =\widehat \sigma^2_{2}+\widehat \sigma^2_{4}+2~\widehat \sigma_{24}\]

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\]

\[\widehat \beta_1 + \widehat \beta_3 = a^{\prime}\widehat \beta,~~~ a = (0,1,0,1)^{\prime}.\]

\[ \mbox{s.e.}(a^{\prime} \widehat \beta_N) = \widehat \omega_N / \sqrt{N} = \sqrt{\frac{\widehat \sigma^2_{2} + \widehat \sigma^2_{4} + 2 ~ \widehat \sigma_{24}}{N}}. \]

Consider null of zero average effect for girls, \(H_0 : \beta_1 + \beta_3=0\) against 2-sided alternative.

Example 1: Progresa

\[T_N = \biggl | \frac{a^{\prime} \widehat \beta_N}{\mbox{s.e.}(a^{\prime} \widehat \beta_N)} \biggr |,~~~~p =2 (1- \Phi(T_N)),\] \[\widehat C_N = \left[ a^{\prime} \widehat\beta_N - c_{1-\frac{\alpha}{2}} \cdot \mbox{s.e.}(a^{\prime} \widehat \beta_N), a^{\prime} \widehat\beta_N+ c_{1-\frac{\alpha}{2}} \cdot \mbox{s.e.}(a^{\prime} \widehat \beta_N) \right].\]

Reject \(H_0: \beta_1 + \beta_3=0\) if

  • \(T_N \ge c_{1-\alpha/2}\), where \(~c_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)\).

  • equivalently, if \(p \le \alpha\),

  • equivalently, if \(0 \not \in \widehat C_N\).

Example 1: Progresa

\[T_N = \biggl | \frac{a^{\prime} \widehat \beta_N}{\mbox{s.e.}(a^{\prime} \widehat \beta_N)} \biggr |,~~~~p =2 (1- \Phi(T_N)),\] \[\widehat C_N = \left[ a^{\prime} \widehat\beta_N - c_{1-\frac{\alpha}{2}} \cdot \mbox{s.e.}(a^{\prime} \widehat \beta_N), a^{\prime} \widehat\beta_N+ c_{1-\frac{\alpha}{2}} \cdot \mbox{s.e.}(a^{\prime} \widehat \beta_N) \right].\]

  • Reject \(H_0: \beta_1 + \beta_3=t_0\) for any \(t_0 \not \in \widehat C_N\).

  • Fail to Reject \(H_0: \beta_1 + \beta_3=t_0\) for any \(t_0 \in \widehat C_N\).

Example 1: Progresa

library(sandwich) # includes vcovHC
library(lmtest) # includes coeftest
reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # heteroskedasticity-robust inference

Note that vcovHC (used by coeftest and coefci above) defaults to HC3; you can switch to HC2 (also good properties) or HC1 (most common in applied economics) via the type argument if desired.
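For example, a sketch of switching variants via the type argument (the data frame below is an illustrative stand-in for the Progresa extract):

```r
library(sandwich)
library(lmtest)
set.seed(1)
df <- data.frame(treat = rbinom(500, 1, 0.5), girl = rbinom(500, 1, 0.5))
df$school <- rbinom(500, 1, 0.80 + 0.03 * df$treat)
reg <- lm(school ~ treat * girl, df)
coeftest(reg, vcov = vcovHC(reg, type = "HC1"))           # HC1 instead of the HC3 default
coefci(reg, level = .95, vcov = vcovHC(reg, type = "HC2")) # HC2-based CIs
```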

Example 1: Progresa

library(sandwich) # includes vcovHC
library(lmtest) # includes coeftest
reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # heteroskedasticity-robust inference
reg.test.1 # heteroskedasticity-robust inference

t test of coefficients:

              Estimate Std. Error  t value              Pr(>|t|)    
(Intercept)  0.8096242  0.0073605 109.9960 < 0.00000000000000022 ***
treat        0.0328038  0.0090176   3.6378             0.0002759 ***
girl        -0.0251258  0.0107449  -2.3384             0.0193799 *  
treat:girl   0.0127312  0.0131906   0.9652             0.3344748    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 1: Progresa

library(sandwich) # includes vcovHC
library(lmtest) # includes coeftest
reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # heteroskedasticity-robust inference
coefci(reg.1,   level=.95, vcov = vcovHC) # heteroskedasticity-robust inference
                  2.5 %       97.5 %
(Intercept)  0.79519671  0.824051618
treat        0.01512820  0.050479382
girl        -0.04618711 -0.004064484
treat:girl  -0.01312403  0.038586445

Example 1: Progresa

library(sandwich) # includes vcovHC
library(lmtest) # includes coeftest
reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # heteroskedasticity-robust inference
vcovHC(reg.1) 
               (Intercept)          treat           girl     treat:girl
(Intercept)  0.00005417675 -0.00005417675 -0.00005417675  0.00005417675
treat       -0.00005417675  0.00008131703  0.00005417675 -0.00008131703
girl        -0.00005417675  0.00005417675  0.00011545281 -0.00011545281
treat:girl   0.00005417675 -0.00008131703 -0.00011545281  0.00017399259
  • vcovHC reports \(\widehat \Sigma_N / N\).
  • How to get s.e. on estimated avg effect for girls, \(\mbox{s.e.}(\widehat \beta_1 + \widehat \beta_3)\)? Inference for avg effect on girls? CI?

Example 1: Progresa

reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # HC3 by default, heteroskedasticity-robust inference
vcovHC(reg.1) 
a <- c(0,1,0,1)
se.girls <- as.numeric(sqrt(t(a)%*%vcovHC(reg.1)%*%a))
se.girls
[1] 0.009626815

Example 1: Progresa

reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # heteroskedasticity-robust inference
vcovHC(reg.1) 
a <- c(0,1,0,1)
se.girls <- as.numeric(sqrt(t(a)%*%vcovHC(reg.1)%*%a))
se.girls
estimate.girls <- t(a)%*%summary(reg.1)$coefficients[,1]
tstat <- as.numeric(estimate.girls / se.girls)
pvalue.girls <- 2 * (1 - pnorm(abs(tstat)))
pvalue.girls 
               [,1]
[1,] 0.000002245008

Reject null of no effect on avg. for girls at level \(0.05\) since p-value \(\le 0.05\).

Example 1: Progresa

reg.1 <- lm(school ~ treat * girl, df) # run regression
reg.test.1 <- coeftest(reg.1,vcov=vcovHC) # heteroskedasticity-robust inference
vcovHC(reg.1) 
a <- c(0,1,0,1)
se.girls <- as.numeric(sqrt(t(a)%*%vcovHC(reg.1)%*%a))
se.girls
estimate.girls <- t(a)%*%summary(reg.1)$coefficients[,1]
crit <- qnorm(.975)
CI.lower <- estimate.girls - crit * se.girls
CI.upper <- estimate.girls + crit * se.girls
print(paste0("95% CI:(",round(CI.lower,digits=4),",",round(CI.upper,digits=4),")"))
[1] "95% CI:(0.0267,0.0644)"

Which values of the average effect for girls can we reject at the 5% level?

Example 2: Cost Function (Nerlove 1963)

\[\begin{multline*} \log C_i = \beta_0 + \beta_1 \log Q_i + \beta_2 \log PL_i +\\ \beta_3 \log PK_i + \beta_4 \log PF_i + \epsilon_i,\end{multline*}\]

\[~H_0 : \beta_2+ \beta_3 + \beta_4 =1, ~~~ \mbox{vs}~~~H_1: \beta_2+ \beta_3 + \beta_4 \ne 1.\] Restate as: \[~H_0 : a'\beta =1, ~~~ \mbox{vs}~~~H_1: a'\beta \ne 1,\] for \(a = (0,0,1,1,1)^{\prime}\).

Example 2: Cost Function (Nerlove 1963)

\[H_0 : a'\beta =1, ~~ \mbox{vs}~~H_1: a'\beta \ne 1,~~ \mbox{with} ~~a = (0,0,1,1,1)^{\prime}.\] \[T_N = \biggl | \frac{a^{\prime} \widehat \beta_N -1}{\mbox{s.e.}(a^{\prime} \widehat \beta_N)} \biggr |,~~~~p =2 (1- \Phi(T_N)),\\ \mbox{s.e.}(a^{\prime} \widehat \beta_N) = \widehat \omega_N / \sqrt{N}~~~\mbox{with} ~~~ \widehat \omega^2_N = a^{\prime} \widehat \Sigma_N a.\]

Reject \(H_0\) if

  • \(T_N \ge c_{1-\alpha/2}\), where \(~c_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)\).

  • equivalently, if \(p \le \alpha\),

Example 3: Mincer Wage Equation

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • Marginal effect of experience at \(10\) years of experience: \(\beta_2 + 20 ~\beta_3 = a^{\prime} \beta\) with \(a =(0,0,1,20)^{\prime}\).

  • Confidence interval on that parameter: \[ \left[ a^{\prime} \widehat\beta_N - c_{1-\frac{\alpha}{2}} \cdot \mbox{s.e.}(a^{\prime} \widehat \beta_N), a^{\prime} \widehat\beta_N+ c_{1-\frac{\alpha}{2}} \cdot \mbox{s.e.}(a^{\prime} \widehat \beta_N) \right].\]

Example 3: Mincer Wage Equation

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • Average Marginal effect of Experience: \(\beta_2 + 2~\mathbb{E}[\mbox{exp}] ~\beta_3\),
    • cannot be written as \(a^{\prime} \beta\) for any constant \(a\) if \(\mathbb{E}[\mbox{exp}]\) has to be estimated.
    • estimator \(\widehat \beta_2 + 2~ \overline{\mbox{exp}} ~\widehat\beta_3\) is a nonlinear function of \((\widehat \beta, \overline{X})\); you analyzed it in PS4 & PS5, and we will return to this problem.

Example 1: Progresa Reparameterized

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i,\]

Can reparameterize the model, plugging in \(\mbox{Boy}_i = 1- \mbox{Girl}_i\),

\[ Y_{i} = \gamma_0 + \gamma_1 \mbox{Treat}_i + \gamma_2 \mbox{Boy}_i + \gamma_3 \mbox{Treat}_i \times \mbox{Boy}_i + \epsilon_i,\]

  • Effect on girls: \(\beta_1+\beta_3 = \gamma_1\).

  • Estimated effect on girls \(\widehat \beta_1+ \widehat \beta_3 = \widehat \gamma_1\).

  • How to show this equivalence?

Example 1: Progresa Reparameterized

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i,\]

Can reparameterize the model, plugging in \(\mbox{Boy}_i = 1- \mbox{Girl}_i\),

\[ Y_{i} = \gamma_0 + \gamma_1 \mbox{Treat}_i + \gamma_2 \mbox{Boy}_i + \gamma_3 \mbox{Treat}_i \times \mbox{Boy}_i + \epsilon_i,\]

  • Effect on girls: \(\beta_1+\beta_3 = \gamma_1\).

  • Estimated effect on girls \(\widehat \beta_1+ \widehat \beta_3 = \widehat \gamma_1\).

  • Estimated effect corresponds to one coefficient in reparameterized model.

Example 1: Progresa Reparameterized

df$boy <- 1- df$girl
reg.2 <- lm(school ~ treat * boy, df) # run regression
reg.test.2 <- coeftest(reg.2,vcov=vcovHC) # heteroskedasticity-robust inference
reg.test.2 # heteroskedasticity-robust inference

t test of coefficients:

              Estimate Std. Error  t value              Pr(>|t|)    
(Intercept)  0.7844984  0.0078279 100.2182 < 0.00000000000000022 ***
treat        0.0455350  0.0096268   4.7300           0.000002265 ***
boy          0.0251258  0.0107449   2.3384               0.01938 *  
treat:boy   -0.0127312  0.0131906  -0.9652               0.33447    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 1: Progresa Reparameterized

df$boy <- 1- df$girl
reg.2 <- lm(school ~ treat * boy, df) # run regression
reg.test.2 <- coeftest(reg.2,vcov=vcovHC) # heteroskedasticity-robust inference
coefci(reg.2,   level=.95, vcov = vcovHC) # heteroskedasticity-robust CIs
                   2.5 %     97.5 %
(Intercept)  0.769154724 0.79984202
treat        0.026665266 0.06440473
boy          0.004064484 0.04618711
treat:boy   -0.038586445 0.01312403

Example 1: Progresa Reparameterized

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i,\]

Other alternative reparameterizations can also lead to one coefficient equaling effect for girls, e.g.,

\[ Y_{i} = \delta_0 + \delta_1 \mbox{Girl}_i + \delta_2 \mbox{Treat}_i \times \mbox{Boy}_i+ \delta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\] In this reparameterization:

  • CATE boys: \(\beta_1=\delta_2\), and \(\widehat \beta_1=\widehat \delta_2.\)

  • CATE girls: \(\beta_1+\beta_3 = \delta_3\) and \(\widehat \beta_1+\widehat \beta_3 = \widehat \delta_3.\)

Example 1: Progresa Reparameterized

reg.3 <- lm(school ~ girl+ treat:boy + treat:girl, df) # run regression
reg.test.3 <- coeftest(reg.3,vcov=vcovHC) # heteroskedasticity-robust inference
reg.test.3 # heteroskedasticity-robust inference

t test of coefficients:

              Estimate Std. Error  t value              Pr(>|t|)    
(Intercept)  0.8096242  0.0073605 109.9960 < 0.00000000000000022 ***
girl        -0.0251258  0.0107449  -2.3384             0.0193799 *  
treat:boy    0.0328038  0.0090176   3.6378             0.0002759 ***
girl:treat   0.0455350  0.0096268   4.7300           0.000002265 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 2: Reparameterization

\[\begin{multline*} \log C_i = \beta_0 + \beta_1 \log Q_i + \beta_2 \log PL_i +\\ \beta_3 \log PK_i + \beta_4 \log PF_i + \epsilon_i,\end{multline*}\]

\[~H_0 : \beta_2+ \beta_3 + \beta_4 =1, ~~~ \mbox{vs}~~~H_1: \beta_2+ \beta_3 + \beta_4 \ne 1.\]

  • How to reparameterize so that one coefficient equals \(\beta_2+\beta_3 + \beta_4\)?

Example 3: Reparameterization

\[\ln(wage)_i = \beta_0 + \beta_{1} \texttt{educ}_i + \beta_2 \mbox{exp}_i + \beta_3 \mbox{exp}_i^2 + \epsilon_i.\]

  • Marginal effect of experience at \(10\) years of experience: \(\beta_2 + 20 ~\beta_3\).

  • How to reparameterize so that one coefficient equals \(\beta_2 + 20 ~\beta_3\)?
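One candidate answer can be checked numerically: since \(\beta_2 \mbox{exp} + \beta_3 \mbox{exp}^2 = (\beta_2 + 20\beta_3)\,\mbox{exp} + \beta_3(\mbox{exp}^2 - 20\,\mbox{exp})\), regressing on \(\mbox{exp}\) and \(\mbox{exp}^2 - 20\,\mbox{exp}\) puts \(\beta_2 + 20\beta_3\) on a single coefficient. A sketch using the CPS1985 data as in the slides:

```r
library(AER)
data(CPS1985)
df <- CPS1985[, c("wage", "education", "experience")]
r.orig <- lm(log(wage) ~ education + experience + I(experience^2), df)
# Reparameterized regressors span the same column space, so fits are identical
r.rep <- lm(log(wage) ~ education + experience +
              I(experience^2 - 20 * experience), df)
# Coefficient on experience in r.rep equals beta2 + 20*beta3 from r.orig:
all.equal(unname(coef(r.rep)["experience"]),
          unname(coef(r.orig)["experience"] +
                 20 * coef(r.orig)["I(experience^2)"]))
```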

Multiple Linear Combination of Coefficients

Example 1: Progresa

\[ Y_{i} = \beta_0 + \beta_1 \mbox{Treat}_i + \beta_2 \mbox{Girl}_i + \beta_3 \mbox{Treat}_i \times \mbox{Girl}_i + \epsilon_i.\] Consider testing null of zero average effect for both boys and girls: \[~H_0: \beta_1 =0~ ~ \mbox{and} ~~ \beta_3=0,~~~\mbox{vs.}~~~~H_1: \beta_1 \ne 0~ ~ \mbox{or} ~~ \beta_3\ne 0.\]

  • Joint hypothesis test.
  • How does this differ from a multiple hypothesis test?

Consider \[H_0: R\beta=b ~~\mbox{vs}~~H_1: R\beta \neq b,\] where \(R\) is a \(q \times (K+1)\)-dimensional constant matrix and \(b\) is a \(q\times 1\) vector.

Suppose \(\widehat \Sigma_N \stackrel{p}{\rightarrow} \Sigma\). Then

\[\begin{align*}\sqrt{N}&(\widehat\beta_N - \beta) \stackrel{d}{\rightarrow} N(0, \Sigma )\\ & \Rightarrow~~ \sqrt{N} (R \widehat\beta_N- R \beta) \stackrel{d}{\rightarrow} N(0, R\Sigma R^{\prime})\\ & \Rightarrow~~ N (R \widehat\beta_N- R \beta)^{\top}(R \widehat \Sigma_N R^{\prime})^{-1} (R \widehat\beta_N- R \beta) \stackrel{d}{\rightarrow} \chi^2_q\end{align*}\]

Let \[ \begin{align*} T_N(b) &= N (R \widehat\beta_N- b)^{\top}(R \widehat \Sigma_N R^{\prime})^{-1} (R \widehat\beta_N-b)\\ & = (R \widehat\beta_N- b)^{\top}(R (\widehat \Sigma_N/N) R^{\prime})^{-1} (R \widehat\beta_N-b). \end{align*}\]

\[T_N(b) \stackrel{d} {\rightarrow} \chi^2_q~~~ \mbox{under}~ H_0: R\beta=b\]

\[T_N(b) \stackrel{d} {\rightarrow} \chi^2_q~~~ \mbox{under}~ H_0: R\beta=b\]

  • Reject null at asymptotic level \(\alpha\) if \(T_N(b) > c_{q,1-\alpha}\), where \(c_{q,1-\alpha} = \chi^2_{q,1-\alpha}\), the \(1-\alpha\) quantile of \(\chi_q^2\).

  • \(p\)-value is \(1 - F_{\chi^2_q}\!\left(T_N(b)\right)\); invert the test to obtain a confidence set.

Ex: Progresa, \(H_0: \beta_1 = 0\), \(\beta_3 = 0\)

reg.1 <- lm(school ~ treat * girl, df) # run regression
V.hc3 <- vcovHC(reg.1) # heteroskedasticity-robust variance-covariance matrix
b <- coef(reg.1)
# Wald statistic: (R b - r)' [R V R']^{-1} (R b - r), with r = 0
Rb <- c(b["treat"], b["treat:girl"])
RVRT <- V.hc3[c("treat", "treat:girl"), c("treat", "treat:girl")]
W <- as.numeric(t(Rb) %*% solve(RVRT) %*% Rb)
pval <- 1 - pchisq(W, df = 2)

Ex: Progresa, \(H_0: \beta_1 = 0\), \(\beta_3 = 0\)

reg.1 <- lm(school ~ treat * girl, df) # run regression
V.hc3 <- vcovHC(reg.1) # heteroskedasticity-robust variance-covariance matrix
b <- coef(reg.1)
b
# Wald statistic: (R b - r)' [R V R']^{-1} (R b - r), with r = 0
Rb <- c(b["treat"], b["treat:girl"])
RVRT <- V.hc3[c("treat", "treat:girl"), c("treat", "treat:girl")]
W <- as.numeric(t(Rb) %*% solve(RVRT) %*% Rb)
pval <- 1 - pchisq(W, df = 2)
(Intercept)       treat        girl  treat:girl 
      0.810       0.033      -0.025       0.013 

Ex: Progresa, \(H_0: \beta_1 = 0\), \(\beta_3 = 0\)

reg.1 <- lm(school ~ treat * girl, df) # run regression
V.hc3 <- vcovHC(reg.1) # heteroskedasticity-robust variance-covariance matrix
b <- coef(reg.1)
# Wald statistic: (R b - r)' [R V R']^{-1} (R b - r), with r = 0
Rb <- c(b["treat"], b["treat:girl"])
RVRT <- V.hc3[c("treat", "treat:girl"), c("treat", "treat:girl")]
Rb
RVRT
W <- as.numeric(t(Rb) %*% solve(RVRT) %*% Rb)
pval <- 1 - pchisq(W, df = 2)
     treat treat:girl 
     0.033      0.013 
               treat treat:girl
treat       0.000081  -0.000081
treat:girl -0.000081   0.000170

How to explain and interpret the form of this variance matrix?

Ex: Progresa, \(H_0: \beta_1 = 0\), \(\beta_3 = 0\)

reg.1 <- lm(school ~ treat * girl, df) # run regression
V.hc3 <- vcovHC(reg.1) # heteroskedasticity-robust variance-covariance matrix
b <- coef(reg.1)
# Wald statistic: (R b - r)' [R V R']^{-1} (R b - r), with r = 0
Rb <- c(b["treat"], b["treat:girl"])
RVRT <- V.hc3[c("treat", "treat:girl"), c("treat", "treat:girl")]
W <- as.numeric(t(Rb) %*% solve(RVRT) %*% Rb)
W
pval <- 1 - pchisq(W, df = 2)
[1] 35.6

Note that, for an invertible matrix \(A\), solve(A) returns the inverse \(A^{-1}\).

Ex: Progresa, \(H_0: \beta_1 = 0\), \(\beta_3 = 0\)

reg.1 <- lm(school ~ treat * girl, df) # run regression
V.hc3 <- vcovHC(reg.1) # heteroskedasticity-robust variance-covariance matrix
b <- coef(reg.1)
# Wald statistic: (R b - r)' [R V R']^{-1} (R b - r), with r = 0
Rb <- c(b["treat"], b["treat:girl"])
RVRT <- V.hc3[c("treat", "treat:girl"), c("treat", "treat:girl")]
W <- as.numeric(t(Rb) %*% solve(RVRT) %*% Rb) # solve( ) gives matrix inverse
pval <- 1 - pchisq(W, df = 2)
pval
[1] 0.000000019

Inverting the Wald Test

The \((1-\alpha)\) confidence set for \(R\beta\) obtained by inverting the Wald test is given by \[\begin{align*} & \mathcal{C}_{1-\alpha}\\ & = \left\{ b \in \mathbb{R}^q : T_N(b) \le \chi^2_{q,1-\alpha} \right\}\\ & = \left\{ b \in \mathbb{R}^q: \right.\\ & \qquad \left. (R\hat\beta - b)^\top \left(R(\widehat{\Sigma}/N)R^\top\right)^{-1} (R\hat\beta - b) \le \chi^2_{q,1-\alpha} \right\}. \end{align*}\] where \(\chi^2_{q,1-\alpha}\) is the \(1-\alpha\) quantile of \(\chi_q^2\). \(\mathcal{C}_{1-\alpha}\) is an ellipsoid centered at \(R\hat\beta\) with shape determined by \(R\widehat{\Sigma}R^\top\).
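A membership check for this confidence set takes a few lines; a sketch using the point estimates and HC3 matrix reported earlier for \((\beta_1,\beta_3)\):

```r
Rbhat <- c(0.033, 0.013)                       # (beta1-hat, beta3-hat) from the slides
RVRT  <- matrix(c( 8.1e-05, -8.1e-05,
                  -8.1e-05,  1.74e-04), 2, 2)  # R (Sigma-hat/N) R' (HC3)
in.CS <- function(b) {                         # is b inside the 95% Wald ellipsoid?
  TN <- as.numeric(t(Rbhat - b) %*% solve(RVRT) %*% (Rbhat - b))
  TN <= qchisq(0.95, df = 2)
}
in.CS(c(0, 0))   # FALSE: (0,0) is rejected, matching the Wald test above
in.CS(Rbhat)     # TRUE: the center of the ellipse is always included
```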

95% Confidence Set: (\(\beta_1,\beta_3\))

HC3 estimate of \(\widehat{\Sigma}/N\) for \((\beta_1,\beta_3)\)

              \(\beta_1\)    \(\beta_3\)
\(\beta_1\)    0.000081     -0.000081
\(\beta_3\)   -0.000081      0.000174
  • Off-diagonal is large and negative (explanation?)

  • Variances unequal

  • Tilted, elongated ellipse

Inverting Wald test to construct 95% confidence set on \((\beta_1,\beta_3)\).

95% Confidence Set: (\(\beta_1,\beta_1+\beta_3\))

HC3 estimate of \(R(\widehat{\Sigma}/N)R^\top\) for \(R \beta=(\beta_1,\beta_1+\beta_3)\)

                        \(\beta_1\)    \(\beta_1+\beta_3\)
\(\beta_1\)              0.000081      0.000000
\(\beta_1+\beta_3\)      0.000000      0.000093
  • Off-diagonal is zero (explanation?)

  • Variances close to equal

  • Confidence set close to circular

Inverting Wald test to construct 95% confidence set on \((\beta_1,\beta_1+\beta_3)\).

Joint vs Marginal Inference: Ellipse vs Rectangle

  • The 95% Wald confidence set is an ellipse, in this example, an ellipse in \(\mathbb{R}^2.\)

  • If instead we invert two two-sided 5% t-tests, we obtain marginal CIs: \[ \text{CI}_1 = \hat\beta_1 \pm z_{0.975}\,\text{se}(\hat\beta_1), \qquad \text{CI}_3 = \hat\beta_3 \pm z_{0.975}\,\text{se}(\hat\beta_3). \]

  • Their intersection is the rectangle \(\text{CI}_1 \times \text{CI}_3.\)

Key Difference

  • The ellipse is a joint 95% confidence set for the vector \((\beta_1, \beta_3)\).
  • The rectangle is the intersection of two marginal 95% statements.
  • The rectangle is not a 95% joint confidence set for \(\beta_1, \beta_3\).
  • The shapes differ because:
    • Wald uses the full covariance matrix.
    • Marginal t-tests examine coordinates separately.

95% Confidence Set: (\(\beta_1,\beta_3\))

  • Dashed vertical lines: marginal 95% CI for \(\beta_1\) from inverting a two-sided t-test.
  • Dashed horizontal lines: marginal 95% CI for \(\beta_3\) from inverting a two-sided t-test.
  • The dashed rectangle is the intersection of the two marginal CI statements.
  • Acceptance regions differ (ellipse vs rectangle).

95% Confidence Set: (\(\beta_1,\beta_1+\beta_3\))

  • Dashed vertical lines: marginal 95% CI for \(\beta_1\) from inverting a two-sided t-test.
  • Dashed horizontal lines: marginal 95% CI for \(\beta_1+\beta_3\) from inverting a two-sided t-test.
  • The dashed rectangle is the intersection of the two marginal CI statements.
  • Acceptance regions differ (ellipse vs rectangle).